As a Data Analyst, you will be responsible for developing and running workflows to standardize and load project datasets into a centralized environment. You will work with therapeutic area-facing bioinformatics research scientists to leverage common data models, and you will support GRC Bioinformaticians’ needs for loading and querying data. Your expertise in PostgreSQL for database management, and in Python and R for scripting and automation, will be crucial in developing and maintaining ETL processes that ensure data quality and integrity.
Responsibilities
- Develop and maintain a functional understanding of the GRC common data models, loading processes, and requirements, and accurately and efficiently load new and historical datasets into the GRC’s Omics Data Server.
- Collaborate with Bioinformatics Engineers to develop and implement additional data-loading workflows.
- Partner with Bioinformatics research scientists to identify, process, and load project data into the common data models.
- Build and execute ETL processes to integrate non-GRC generated high-value datasets into the common data models.
- Keep thorough documentation for tracking datasets and loading tasks.
- Ensure reproducibility and facilitate collaboration with team members by documenting code and versioning it with Git.
Qualifications
- Bachelor's degree in computer science, bioinformatics, or a related field.
- Experience building and running workflows for RDBMS data loading and ETL processes.
- Proficiency in PostgreSQL (or equivalent) and the ability to write complex queries for data extraction and analysis.
- Strong programming skills in Python for scripting and automation. Additional experience with R is preferred.
- Familiarity with genomic data formats and databases commonly used in bioinformatics research.
- Knowledge of data modeling concepts and of implementing common data models in a relational database.
- Familiarity with data cleaning, normalization, and quality control processes.
- Excellent communication skills and ability to collaborate with researchers and stakeholders.